362 research outputs found

    Visual Data Mining

    Occlusion is one of the major problems for interactive visual knowledge discovery and data mining in the process of finding patterns in multidimensional data. This project proposes a hybrid method that combines visual and analytical means to deal with occlusion in visual knowledge discovery. The method, called GLC-S, visualizes n-D data in 2D in a set of Shifted Paired Coordinates (SPC). A set of Shifted Paired Coordinates for n-D data consists of n/2 pairs of common Cartesian coordinates that are shifted relative to each other to avoid their overlap. Each n-D point A is represented as a directed graph A* in SPC, where each node is the 2D projection of A in a respective pair of the Cartesian coordinates. The proposed GLC-S method significantly decreases the cognitive load of analyzing n-D data and simplifies pattern discovery in n-D data. The GLC-S method iteratively splits n-D data into non-overlapping clusters (hyper-rectangles) around local centers and visualizes only the data within these clusters at each iteration. Each cluster is required to contain cases of only one class and to be the largest cluster with this property in the SPC visualization. Such sequential splitting allows: (1) avoiding occlusion, (2) finding local classification patterns and rules visually, and (3) combining local sub-rules into a global rule that classifies all given data of two or more classes. Computational experiments with Wisconsin Breast Cancer data (9-D), User Knowledge Modeling data (6-D), and Letter Recognition data (17-D) from the UCI Machine Learning Repository confirm this capability. At each iteration, these data were split into training (70%) and validation (30%) data. The method required 3 iterations for Wisconsin Breast Cancer data, 4 iterations for User Knowledge Modeling data, and 5 iterations for Letter Recognition data, and correspondingly 3, 4, and 5 local sub-rules, which covered over 95% of all n-D data points with 100% accuracy in both the training and validation experiments.
After each iteration, the data that were used in this iteration are removed and the remaining data are used in the next iteration. This removal process also helps to decrease occlusion. The GLC-S algorithm refuses to classify the remaining cases that are not covered by these rules, i.e., cases that do not belong to the found hyper-rectangles. The interactive visualization process in SPC allows adjusting the sides of a hyper-rectangle to maximize its size without overlapping the hyper-rectangles of the opposing classes. The GLC-S method splits data using a fixed split of the n coordinates into pairs. This hybrid visual and analytical approach avoids throwing all data of several classes into a single visualization plot, which typically ends up as a messy, highly occluded picture that hides useful patterns. This approach allows revealing these hidden patterns. The visualization process in SPC is reversible (lossless), i.e., all n-D information is visualized in 2D and can be restored from the 2D visualization for each n-D case. This hybrid visual analytics method allowed classifying n-D data in a way that can be communicated to users in an understandable, visual form.
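
The SPC mapping and the hyper-rectangle rules described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `to_spc`, `from_spc`, and `classify`, the shift values, and the rule format are all hypothetical, and the sketch assumes an even number of coordinates (the abstract does not say how odd n is handled).

```python
from typing import List, Optional, Sequence, Tuple

Point2D = Tuple[float, float]
# Hypothetical rule format: per-coordinate (low, high) bounds plus a class label.
Rule = Tuple[Sequence[Tuple[float, float]], str]


def to_spc(point: Sequence[float], shifts: Sequence[Point2D]) -> List[Point2D]:
    """Map an n-D point to n/2 shifted 2D nodes.

    Consecutive nodes in the returned list are the endpoints of the
    directed edges of the graph A*.
    """
    if len(point) % 2 != 0:
        raise ValueError("this sketch assumes an even number of coordinates")
    pairs = [(point[i], point[i + 1]) for i in range(0, len(point), 2)]
    if len(shifts) != len(pairs):
        raise ValueError("one shift is needed per coordinate pair")
    return [(x + sx, y + sy) for (x, y), (sx, sy) in zip(pairs, shifts)]


def from_spc(nodes: Sequence[Point2D], shifts: Sequence[Point2D]) -> List[float]:
    """Undo the shifts to restore the n-D point (the mapping is lossless)."""
    point: List[float] = []
    for (u, v), (sx, sy) in zip(nodes, shifts):
        point.extend([u - sx, v - sy])
    return point


def classify(point: Sequence[float], rules: Sequence[Rule]) -> Optional[str]:
    """Return the label of the first hyper-rectangle containing the point.

    Returns None when no rule covers the point, mirroring the refusal to
    classify uncovered cases.
    """
    for bounds, label in rules:
        if all(lo <= x <= hi for x, (lo, hi) in zip(point, bounds)):
            return label
    return None


shifts = [(0, 0), (10, 0)]           # assumed shifts for a 4-D example
nodes = to_spc([1, 2, 3, 4], shifts)
restored = from_spc(nodes, shifts)
rules = [([(0, 2), (0, 3), (2, 4), (3, 5)], "class A")]
```

Here `nodes` holds the two 2D projections of the 4-D point, `restored` recovers the original coordinates, and `classify` returns `None` for any point outside every hyper-rectangle.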

    Exploratory topic modeling with distributional semantics

    As we continue to collect and store textual data in a multitude of domains, we are regularly confronted with material whose largely unknown thematic structure we want to uncover. With unsupervised, exploratory analysis, no prior knowledge about the content is required and highly open-ended tasks can be supported. In the past few years, probabilistic topic modeling has emerged as a popular approach to this problem. Nevertheless, the representation of the latent topics as aggregations of semi-coherent terms limits their interpretability and level of detail. This paper presents an alternative approach to topic modeling that maps topics as a network for exploration, based on distributional semantics using learned word vectors. From the granular level of terms and their semantic similarity relations, global topic structures emerge as clustered regions and gradients of concepts. Moreover, the paper discusses the visual interactive representation of the topic map, which plays an important role in supporting its exploration. Comment: Conference: The Fourteenth International Symposium on Intelligent Data Analysis (IDA 2015)
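
One way to picture a term network built from word vectors is to link terms whose cosine similarity exceeds a threshold. The sketch below is an assumption for illustration only: the abstract does not specify the similarity measure or how the network is constructed, and the vectors, the threshold, and the names `cosine` and `similarity_edges` are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_edges(
    vectors: Dict[str, Sequence[float]], threshold: float
) -> List[Tuple[str, str]]:
    """Link every pair of terms whose vectors are at least `threshold` similar."""
    return [
        (s, t)
        for s, t in combinations(vectors, 2)
        if cosine(vectors[s], vectors[t]) >= threshold
    ]


# Toy 2-D vectors standing in for learned word embeddings.
vectors = {"cat": (1.0, 0.0), "dog": (0.9, 0.1), "tax": (0.0, 1.0)}
edges = similarity_edges(vectors, threshold=0.8)
```

With these toy vectors only "cat" and "dog" are linked; densely connected groups in such a graph are the kind of structure that could be read as a topic region.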

    Influence of vaccine-preventable diseases and HIV infection on demand for an infectious diseases service in Rio de Janeiro State, Brazil, over 22 years – Part II (1995-2016)

    Patient data gathered during daily clinical care are extremely important for improving the allocation of healthcare resources and for assessing healthcare demands. The prospective gathering of these data over decades allowed us to describe the trends of infectious diseases in a tertiary hospital. The results concerning the period between 1965 and 1994 described the exponential increase in the incidence of HIV infection and its important effects on our institutional mortality. The present study describes the demand for the same hospital between 1995 and 2016. There were 4,691 admissions, and the main causes of admission were, in descending order, HIV infection (1,312, 28.0%), noninfectious diseases (447, 9.5%), meningoencephalitis (432, 9.2%), soft tissue infections (427, 9.1%), tuberculosis (272, 5.8%), pneumonias (212, 4.5%) and leptospirosis (212, 4.5%). There were 864 readmissions, most due to HIV infection (65.2%). The institutional mortality fell from 16.9% in the first two years to 5.0% in the last two years of the study. The case-fatality rate among HIV patients decreased from more than 40% to approximately 5% over the study period. In the last two decades, the hospital experienced a decrease in demand due to vaccine-preventable diseases. The demand for children has fallen and the demand for patients over the age of 50 has increased. These results reflect the improvement in public health standards over more than half a century and the positive effects of the National Immunization Program. They also illustrate the sharp decline in the HIV case-fatality rate after the introduction of combined antiretroviral therapy.

    Visual Analytics for Network Security and Critical Infrastructures

    A comprehensive analysis of cyber attacks is important for a better understanding of their nature and origin. Providing sufficient insight into such a vast amount of diverse (and sometimes seemingly unrelated) data is a task that is suitable neither for humans nor for fully automated algorithms alone. Not only a combination of the two approaches but also a continuous reasoning process that is capable of generating a sufficient knowledge base is indispensable for a better understanding of the events. Our research is focused on designing new exploratory methods and interactive visualizations in the context of network security. The knowledge generation loop is important for its ability to help analysts refine their understanding of the processes that continuously occur and to offer them better insight into network security related events. In this paper, we formulate the research questions that relate to the proposed solution.

    The four faces of information visualization: A conceptual framework for a postgraduate program

    The multidisciplinary nature of information visualization is today fairly consensual in both professional and academic communities: data analysis, information design, and storytelling, among other subjects, are common drivers in this field. The systematic study of this cross-fertilization, patent in the way the concept's definition varies according to the perspective being adopted, represents an important and needed addition to the critical mass of a relatively recent area of knowledge. Since proposing a single unified definition of information visualization is beyond the scope of this paper, it instead gathers and discusses the field's multiple viewpoints to help design a postgraduate program on the topic, aiming to simultaneously start an open debate as its implementation phase goes on and new questions are subsequently raised.

    Evaluation of two interaction techniques for visualization of dynamic graphs

    Several techniques for visualization of dynamic graphs are based on different spatial arrangements of a temporal sequence of node-link diagrams. Many studies in the literature have investigated the importance of maintaining the user's mental map across this temporal sequence, but usually each layout is considered as a static graph drawing and the effect of user interaction is disregarded. We conducted a task-based controlled experiment to assess the effectiveness of two basic interaction techniques: the adjustment of the layout stability and the highlighting of adjacent nodes and edges. We found that generally both interaction techniques increase accuracy, sometimes at the cost of longer completion times, and that the highlighting outclasses the stability adjustment for many tasks except the most complex ones. Comment: Appears in the Proceedings of the 24th International Symposium on Graph Drawing and Network Visualization (GD 2016)

    Goal-Based Selection of Visual Representations for Big Data Analytics

    The H2020 TOREADOR Project adopts a model-driven architecture to streamline big data analytics and make it widely available to companies as a service. Our work in this context focuses on visualization, in particular on how to automate the translation of the visualization objectives declared by the user into a suitable visualization type. To this end, we first define a visualization context based on seven prioritizable coordinates for assessing the user's objectives and describing the data to be visualized; then we propose a skyline-based technique for automatically translating a visualization context into a set of suitable visualization types. Finally, we evaluate our approach on a real use case excerpted from the pilot applications of TOREADOR.
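
The general idea behind a skyline query is Pareto dominance: a candidate stays in the skyline unless some other candidate is at least as good on every criterion and strictly better on at least one. The Python sketch below shows only that generic idea. The suitability scores, the use of two criteria instead of seven, the absence of priorities, and the names `dominates` and `skyline` are assumptions for illustration, not the technique as defined in the paper.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    ]


# Hypothetical suitability scores (higher is better) on two criteria.
candidates = {"bar chart": (3, 2), "line chart": (2, 3), "pie chart": (1, 1)}
suitable = skyline(candidates)
```

In this toy case the pie chart is dominated by both other types, so the skyline keeps the bar chart and the line chart as the set of suitable visualization types.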